NSF PAR Search | NSF Public Access Repository

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

Huang, Yixiao; Zhu, Hanlin; Guo, Tianyu; Jiao, Jiantao; Sojoudi, Somayeh; Jordan, Michael I; Russell, Stuart; Mei, Song (September 2025, Conference on Neural Information Processing Systems)

Generative AI models should include detection mechanisms as a condition for public release

https://doi.org/10.1007/s10676-023-09728-4

Knott, Alistair; Pedreschi, Dino; Chatila, Raja; Chakraborti, Tapabrata; Leavy, Susan; Baeza-Yates, Ricardo; Eyers, David; Trotman, Andrew; Teal, Paul D; Biecek, Przemyslaw; et al (December 2023, Ethics and Information Technology)

Abstract: The new wave of ‘foundation models’ (general-purpose generative AI models for production of text, e.g., ChatGPT, or images, e.g., MidJourney) represents a dramatic advance in the state of the art for AI. But their use also introduces a range of new risks, which has prompted an ongoing conversation about possible regulatory mechanisms. Here we propose a specific principle that should be incorporated into legislation: that any organization developing a foundation model intended for public use must demonstrate a reliable detection mechanism for the content it generates, as a condition of its public release. The detection mechanism should be made publicly available in a tool that allows users to query, for an arbitrary item of content, whether the item was generated (wholly or partly) by the model. In this paper, we argue that this requirement is technically feasible and would play an important role in reducing certain risks from new AI models in many domains. We also outline a number of options for the tool’s design, and summarize a number of points where further input from policymakers and researchers would be required.
Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

https://doi.org/10.1109/TIT.2022.3185139

Rashidinejad, Paria; Zhu, Banghua; Ma, Cong; Jiao, Jiantao; Russell, Stuart (December 2022, IEEE Transactions on Information Theory)

MADE: Exploration via Maximizing Deviation from Explored Regions

Zhang, Tianjun; Rashidinejad, Paria; Jiao, Jiantao; Tian, Yuandong; Gonzalez, Joseph E; Russell, Stuart (January 2021, Advances in Neural Information Processing Systems 34 (NeurIPS 2021))

SLIP: Learning to Predict in Unknown Dynamical Systems with Long-Term Memory

Rashidinejad, Paria; Jiao, Jiantao; Russell, Stuart (January 2020, 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada)
Learning Plannable Representations with Causal InfoGAN

Kurutach, Thanard; Tamar, Aviv; Yang, Ge; Russell, Stuart; Abbeel, Pieter (January 2018, Neural Information Processing Systems (NeurIPS))

The Off-Switch Game

https://doi.org/10.24963/ijcai.2017/32

Hadfield-Menell, Dylan; Dragan, Anca; Abbeel, Pieter; Russell, Stuart (August 2017, International Joint Conferences on Artificial Intelligence Organization)

It is clear that one of the primary tools we can use to mitigate the potential risk from a misbehaving AI system is the ability to turn the system off. As the capabilities of AI systems improve, it is important to ensure that such systems do not adopt subgoals that prevent a human from switching the system off. This is a challenge because many formulations of rational agents create strong incentives for self-preservation. This is not caused by a built-in instinct, but because a rational agent will maximize expected utility and cannot achieve whatever objective it has been given if it is dead. Our goal is to study the incentives an agent has to allow itself to be switched off. We analyze a simple game between a human H and a robot R, where H can press R’s off switch but R can disable the off switch. A traditional agent takes its reward function for granted: we show that such agents have an incentive to disable the off switch, except in the special case where H is perfectly rational. Our key insight is that for R to want to preserve its off switch, it needs to be uncertain about the utility associated with the outcome, and to treat H’s actions as important observations about that utility. (R also has no incentive to switch itself off in this setting.) We conclude that giving machines an appropriate level of uncertainty about their objectives leads to safer designs, and we argue that this setting is a useful generalization of the classical AI paradigm of rational agents.
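
The incentive argument in this abstract can be illustrated with a small numerical sketch. It is not the paper's formal model: it assumes a single candidate action with unknown utility U_a, a payoff of 0 for being switched off, a discrete belief held by R, and a perfectly rational H who permits the action only when U_a > 0. The function name `expected_values` and the example numbers are hypothetical.

```python
from typing import Dict, Sequence


def expected_values(utilities: Sequence[float], probs: Sequence[float]) -> Dict[str, float]:
    """Expected payoff of each option for R under its belief about U_a.

    Assumptions (illustrative, not taken from the paper's text):
    - being switched off is worth 0;
    - H is perfectly rational and allows the action only when U_a > 0.
    """
    # Act immediately, bypassing (disabling) the off switch.
    act = sum(p * u for u, p in zip(utilities, probs))
    # Switch itself off.
    off = 0.0
    # Defer to H: the action goes ahead only in the cases where U_a > 0.
    defer = sum(p * max(u, 0.0) for u, p in zip(utilities, probs))
    return {"act": act, "off": off, "defer": defer}


# R is uncertain: the action is equally likely to be worth -1 or +2.
uncertain = expected_values([-1.0, 2.0], [0.5, 0.5])

# R is certain the action is worth +2.
certain = expected_values([2.0], [1.0])
```

Under the uncertain belief, deferring is worth 1.0 against 0.5 for acting directly, so R gains by leaving the off switch intact. Under the certain belief, deferring and acting are both worth 2.0, so R gains nothing from H's oversight, which matches the abstract's point that uncertainty about the objective is what makes the off switch valuable to R.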